
    A Result for Orthogonal Plus Rank-1 Matrices

    In this paper, the sum of an orthogonal matrix and an outer product is studied, and a relation between the norms of the vectors forming the outer product and the singular values of the resulting matrix is presented. The main result may be found in Theorem 1.
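    The flavor of the result can be checked numerically. Below is a minimal sketch (ours, not the paper's): since A^T A is the identity plus a matrix of rank at most two, at most two singular values of A = Q + uv^T can differ from 1, and the relation of these to the norms of u and v is presumably what Theorem 1 makes precise.

```python
# Minimal numerical sketch: singular values of an orthogonal matrix
# plus a rank-1 outer product. Not the paper's proof, just an experiment.
import numpy as np

rng = np.random.default_rng(0)
n = 5

# Random orthogonal Q via QR decomposition of a Gaussian matrix.
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
u = rng.standard_normal(n)
v = rng.standard_normal(n)

A = Q + np.outer(u, v)                       # orthogonal plus rank-1
sigma = np.linalg.svd(A, compute_uv=False)

# A^T A = I + Q^T u v^T + v u^T Q + ||u||^2 v v^T is a rank-<=2
# perturbation of the identity, so all but (at most) two singular
# values equal 1.
print("||u||, ||v|| =", np.linalg.norm(u), np.linalg.norm(v))
print("singular values:", sigma)
```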

    Homography-Based Positioning and Planar Motion Recovery

    Planar motion is an important and frequently occurring situation in mobile robotics applications. This thesis concerns estimation of ego-motion and pose of a single downwards-oriented camera under the assumptions of planar motion and known internal camera parameters. The so-called essential matrix (or its uncalibrated counterpart, the fundamental matrix) is frequently used in computer vision applications to compute a 3D reconstruction of the camera locations and the observed scene. However, if the observed points are expected to lie on a plane - e.g. the ground plane - the determination of these matrices becomes an ill-posed problem. Instead, methods based on homographies are better suited to this situation.

    One section of this thesis is concerned with the extraction of the camera pose and ego-motion from such homographies. We present both a direct SVD-based method and an iterative method, both of which solve this problem. The iterative method is extended to allow simultaneous determination of the camera tilt from several homographies obeying the same planar motion model. This extension improves the robustness of the original method and provides consistent tilt estimates across the frames used for the estimation. The methods are evaluated in experiments on both real and synthetic data.

    Another part of the thesis deals with the problem of computing the homographies from point correspondences. Conventional homography estimation methods yield a homography of too general a class, which is not guaranteed to be compatible with the planar motion assumption. For this reason, we enforce the planar motion model at the homography estimation stage with the help of a new homography solver based on a number of polynomial constraints on the entries of the homography matrix. In addition to producing a homography of the right type, this method uses only 2.5 point correspondences instead of the conventional four, which is advantageous, e.g., when used in a RANSAC framework for outlier removal.
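    For contrast, the conventional pipeline that the last paragraph argues against can be sketched as follows: estimate a general 8-DoF homography with RANSAC from four or more correspondences, then decompose it into candidate motions, with the planar-motion assumption checked only after the fact. The intrinsics and point sets below are placeholders, and OpenCV is used purely for illustration.

```python
# Conventional (unconstrained) homography pipeline, for contrast with
# the thesis' 2.5-point planar-motion solver. Placeholder data.
import cv2
import numpy as np

K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])       # illustrative intrinsics

rng = np.random.default_rng(0)
pts1 = (rng.random((20, 2)) * 640).astype(np.float32)
pts2 = pts1 + np.float32([5.0, 2.0])        # placeholder correspondences

# General homography from 4+ points with RANSAC outlier rejection.
H, inlier_mask = cv2.findHomography(pts1, pts2, cv2.RANSAC, 3.0)

# Decompose into candidate (R, t, n) solutions; nothing here enforces
# the planar-motion model during estimation.
_, Rs, ts, normals = cv2.decomposeHomographyMat(H, K)
for R, t, n in zip(Rs, ts, normals):
    print("t =", t.ravel(), " n =", n.ravel())
```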

    Embed Me If You Can: A Geometric Perceptron

    Solving geometric tasks involving point clouds by using machine learning is a challenging problem. Standard feed-forward neural networks combine linear or, if the bias parameter is included, affine layers and activation functions. Their geometric modeling is limited, which motivated the prior work introducing the multilayer hypersphere perceptron (MLHP). Its constituent part, i.e., the hypersphere neuron, is obtained by applying a conformal embedding of Euclidean space. By virtue of Clifford algebra, it can be implemented as the Cartesian dot product of inputs and weights. If the embedding is applied in a manner consistent with the dimensionality of the input space geometry, the decision surfaces of the model units become combinations of hyperspheres and make the decision-making process geometrically interpretable for humans. Our extension of the MLHP model, the multilayer geometric perceptron (MLGP), and its respective layer units, i.e., geometric neurons, are consistent with the 3D geometry and provide a geometric handle on the learned coefficients. In particular, the geometric neuron activations are isometric in 3D. When classifying the 3D Tetris shapes, we quantitatively show that our model requires no activation function in the hidden layers, other than the embedding, to outperform the vanilla multilayer perceptron. In the presence of noise in the data, our model is also superior to the MLHP.
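    As a concrete illustration of the embedding (our reconstruction of the standard construction, with our own variable names): a hypersphere neuron's response to a point can be written as a plain dot product between an embedded input and a weight vector that encodes a sphere.

```python
# Hypersphere neuron as a dot product after a conformal-style embedding.
# Sketch with our own conventions; sign/scaling choices vary in the literature.
import numpy as np

def embed_point(x):
    # x in R^n  ->  (x, -1, -||x||^2 / 2) in R^(n+2)
    return np.concatenate([x, [-1.0, -0.5 * (x @ x)]])

def sphere_weights(c, r):
    # Sphere with centre c, radius r  ->  (c, (||c||^2 - r^2) / 2, 1)
    return np.concatenate([c, [0.5 * (c @ c - r * r), 1.0]])

x = np.array([1.0, 2.0, 2.0])
c = np.zeros(3)
r = 4.0

# The dot product equals -(||x - c||^2 - r^2) / 2: positive inside the
# sphere, zero on it, negative outside, so the unit's decision surface
# is a hypersphere.
act = embed_point(x) @ sphere_weights(c, r)
print(act, -0.5 * ((x - c) @ (x - c) - r * r))   # both 3.5 here
```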

    TetraSphere: A Neural Descriptor for O(3)-Invariant Point Cloud Analysis

    Rotation invariance is an important requirement for the analysis of 3D point clouds. In this paper, we present a learnable descriptor for rotation- and reflection-invariant 3D point cloud analysis based on the recently introduced steerable 3D spherical neurons and vector neurons. Specifically, we show the compatibility of the two approaches and apply steerable neurons in an end-to-end method, which together constitute the technical novelty. In our approach, we perform the TetraTransform -- which lifts the 3D input to an equivariant 4D representation, constructed by the steerable neurons -- and extract deeper rotation-equivariant features using vector neurons. This integration of the TetraTransform into the VN-DGCNN framework, termed TetraSphere, inexpensively increases the number of parameters by less than 0.0007%. Taking only points as input, TetraSphere sets a new state-of-the-art performance in classifying randomly rotated real-world object scans of the hardest subset of ScanObjectNN, even when trained on data without additional rotation augmentation. Additionally, TetraSphere demonstrates the second-best performance in segmenting parts of the synthetic ShapeNet, consistently outperforming the baseline VN-DGCNN. All in all, our results reveal the practical value of steerable 3D spherical neurons for learning in 3D Euclidean space.
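    The property being targeted admits a simple sanity test: a descriptor f is O(3)-invariant if f(P O^T) = f(P) for every orthogonal O, including reflections. The check below uses a stand-in invariant descriptor (sorted pairwise distances), not TetraSphere itself.

```python
# Toy O(3)-invariance check with a stand-in descriptor.
import numpy as np

def descriptor(points):
    # Sorted pairwise distances: trivially O(3)-invariant, far weaker
    # than a learned descriptor, but enough to illustrate the test.
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    return np.sort(d, axis=None)

rng = np.random.default_rng(42)
P = rng.standard_normal((128, 3))

# Random orthogonal matrix via QR; det(O) may be -1 (a reflection).
O, _ = np.linalg.qr(rng.standard_normal((3, 3)))

assert np.allclose(descriptor(P), descriptor(P @ O.T))
print("descriptor unchanged under a random rotation/reflection")
```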

    DeDoDe: Detect, Don't Describe -- Describe, Don't Detect for Local Feature Matching

    Keypoint detection is a pivotal step in 3D reconstruction, whereby sets of (up to) K points are detected in each view of a scene. Crucially, the detected points need to be consistent between views, i.e., correspond to the same 3D point in the scene. One of the main challenges with keypoint detection is the formulation of the learning objective. Previous learning-based methods typically jointly learn descriptors with keypoints, and treat keypoint detection as a binary classification task on mutual nearest neighbours. However, basing keypoint detection on descriptor nearest neighbours is a proxy task, which is not guaranteed to produce 3D-consistent keypoints. Furthermore, this ties the keypoints to a specific descriptor, complicating downstream usage. In this work, we instead learn keypoints directly from 3D consistency. To this end, we train the detector to detect tracks from large-scale SfM. As these points are often overly sparse, we derive a semi-supervised two-view detection objective to expand this set to a desired number of detections. To train a descriptor, we maximize the mutual nearest neighbour objective over the keypoints with a separate network. Results show that our approach, DeDoDe, achieves significant gains on multiple geometry benchmarks. Code is provided at https://github.com/Parskatt/DeDoDe.
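    The mutual-nearest-neighbour criterion referred to above is compact to state in code: a pair (i, j) survives only if j is i's best match and i is j's best match. A minimal sketch with placeholder descriptors:

```python
# Mutual nearest neighbour (MNN) matching between two descriptor sets.
import numpy as np

def mutual_nearest_neighbours(descA, descB):
    sim = descA @ descB.T                  # similarity matrix
    best_in_B = sim.argmax(axis=1)         # i -> its best j
    best_in_A = sim.argmax(axis=0)         # j -> its best i
    i = np.arange(descA.shape[0])
    mutual = best_in_A[best_in_B] == i     # does i -> j map back to i?
    return np.stack([i[mutual], best_in_B[mutual]], axis=1)

rng = np.random.default_rng(0)
descA = rng.standard_normal((500, 256))    # placeholder descriptors
descB = rng.standard_normal((480, 256))
matches = mutual_nearest_neighbours(descA, descB)
print(matches.shape)                       # (num_mutual_matches, 2)
```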

    DKM: Dense Kernelized Feature Matching for Geometry Estimation

    Feature matching is a challenging computer vision task that involves finding correspondences between two images of a 3D scene. In this paper, we consider the dense approach instead of the more common sparse paradigm, thus striving to find all correspondences. Perhaps counter-intuitively, dense methods have previously shown inferior performance to their sparse and semi-sparse counterparts for estimation of two-view geometry. This changes with our novel dense method, which outperforms both dense and sparse methods on geometry estimation. The novelty is threefold: First, we propose a kernel regression global matcher. Secondly, we propose warp refinement through stacked feature maps and depthwise convolution kernels. Thirdly, we propose learning dense confidence through consistent depth and a balanced sampling approach for dense confidence maps. Through extensive experiments we confirm that our proposed dense method, Dense Kernelized Feature Matching (DKM), sets a new state-of-the-art on multiple geometry estimation benchmarks. In particular, we achieve an improvement on MegaDepth-1500 of +4.9 and +8.9 AUC@5° compared to the best previous sparse method and dense method respectively. Our code is provided at https://github.com/Parskatt/dk
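    The first ingredient, kernel regression as a matcher, can be loosely illustrated with generic kernel ridge regression from feature similarities to coordinates; this shows the idea only and is not DKM's actual global matcher.

```python
# Generic kernel ridge regression from features to coordinates, as a
# loose illustration of a kernel-regression matcher. Placeholder data.
import numpy as np

def rbf_kernel(X, Y, gamma=0.1):
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

rng = np.random.default_rng(0)
featsA = rng.standard_normal((100, 32))    # features from image A
featsB = rng.standard_normal((120, 32))    # features from image B
coordsB = rng.random((120, 2))             # normalized coords in B

lam = 1e-3                                 # ridge regularizer
K_BB = rbf_kernel(featsB, featsB)
K_AB = rbf_kernel(featsA, featsB)

# For each feature in A, regress its corresponding coordinate in B.
pred = K_AB @ np.linalg.solve(K_BB + lam * np.eye(len(featsB)), coordsB)
print(pred.shape)                          # (100, 2)
```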

    RoMa: Revisiting Robust Losses for Dense Feature Matching

    Dense feature matching is an important computer vision task that involves estimating all correspondences between two images of a 3D scene. In this paper, we revisit robust losses for matching from a Markov chain perspective, yielding theoretical insights and large gains in performance. We begin by constructing a unifying formulation of matching as a Markov chain, based on which we identify two key stages which we argue should be decoupled for matching. The first is the coarse stage, where the estimated result needs to be globally consistent. The second is the refinement stage, where the model needs precise localization capabilities. Inspired by the insight that these stages concern distinct issues, we propose a coarse matcher following the regression-by-classification paradigm that provides excellent globally consistent, albeit not exactly localized, matches. This is followed by a local feature refinement stage using well-motivated robust regression losses, yielding extremely precise matches. Our proposed approach, which we call RoMa, achieves significant improvements compared to the state-of-the-art. Code is available at https://github.com/Parskatt/RoM
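    The regression-by-classification paradigm used for the coarse matcher can be sketched in one dimension: discretize the coordinate into bins, supervise with cross-entropy against the nearest bin, and decode a coordinate from the predicted distribution. This illustrates the paradigm only, not RoMa's architecture.

```python
# Regression-by-classification in 1D: bins + classification targets,
# with a coordinate decoded from the class distribution.
import numpy as np

bins = np.linspace(0.0, 1.0, 64)     # discretized coordinate space

def to_class(y):
    # Regression target -> index of the nearest bin (cross-entropy target).
    return np.abs(bins - y).argmin()

def decode(p):
    # Class probabilities -> coordinate, here via the expectation.
    return float((p * bins).sum())

y = 0.374
label = to_class(y)
p = np.zeros_like(bins)
p[label] = 1.0                       # a perfectly confident "model"
print(label, decode(p))              # recovers y up to the bin width
```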

    Trust Your IMU: Consequences of Ignoring the IMU Drift

    In this paper, we argue that modern pre-integration methods for inertial measurement units (IMUs) are accurate enough to ignore the drift for short time intervals. This allows us to consider a simplified camera model, which in turn admits further intrinsic calibration. We develop the first-ever solver to jointly solve the relative pose problem with unknown and equal focal length and radial distortion profile while utilizing the IMU data. Furthermore, we show a significant speed-up compared to state-of-the-art algorithms, with small or negligible loss in accuracy for partially calibrated setups. The proposed algorithms are tested on both synthetic and real data, where the latter is focused on navigation using unmanned aerial vehicles (UAVs). We evaluate the proposed solvers on different commercially available low-cost UAVs and demonstrate that the novel assumption on IMU drift is feasible in real-life applications. The extended intrinsic auto-calibration enables us to use distorted input images, rendering the tedious calibration processes required by current state-of-the-art methods obsolete.
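    The core simplification can be made concrete: if the IMU rotation R is trusted over a short interval, the epipolar constraint x2^T [t]x R x1 = 0 becomes linear in the translation t, which then follows from a nullspace computation. The sketch below demonstrates this principle on synthetic calibrated points; it is not the paper's joint focal-length-and-distortion solver.

```python
# Translation direction from a known rotation via the epipolar constraint.
import numpy as np

def translation_from_known_rotation(x1s, x2s, R):
    # x2^T (t x (R x1)) = t . ((R x1) x x2): one linear equation per
    # correspondence; t (up to sign and scale) spans the nullspace.
    A = np.cross((R @ x1s.T).T, x2s)
    _, _, Vt = np.linalg.svd(A)
    return Vt[-1]

# Synthetic check: random proper rotation, random scene in front of cam 1.
rng = np.random.default_rng(1)
R, _ = np.linalg.qr(rng.standard_normal((3, 3)))
R *= np.sign(np.linalg.det(R))            # enforce det(R) = +1
t = rng.standard_normal(3)
X = rng.standard_normal((10, 3)) + [0.0, 0.0, 5.0]
x1s = X / X[:, 2:]                        # calibrated points, view 1
X2 = (R @ X.T).T + t
x2s = X2 / X2[:, 2:]                      # calibrated points, view 2

t_est = translation_from_known_rotation(x1s, x2s, R)
print(np.cross(t_est, t / np.linalg.norm(t)))  # ~0: directions agree
```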

    Planar Motion and Visual Odometry: Pose Estimation from Homographies

    This thesis concerns ego-motion and pose estimation of a single camera under the assumptions of planar motion and constant internal camera parameters. Planar motion is common for cameras mounted onto mobile robots, particularly in indoor scenarios, as they remain at a constant height above the ground plane. In Paper A, a parametrisation of the camera motion and pose is presented, along with an iterative approach for determining the parameters. Paper B describes how to extend the method in Paper A to use more than one homography at a time in the estimation process, thereby improving the estimation accuracy and robustness. Paper C presents an alternative method for estimating the distance between camera positions that is independent of the estimated orientation of the cameras.

    Ego-Motion Recovery and Robust Tilt Estimation for Planar Motion Using Several Homographies

    In this paper, we suggest an improvement to a recent algorithm for estimating the pose and ego-motion of a camera that is constrained to planar motion at a constant height above the floor, with a constant tilt. Such motion is common in robotics applications where a camera is mounted onto a mobile platform and directed towards the floor. Due to the planar nature of the scene, images taken with such a camera will be related by a planar homography, which may be used to extract the ego-motion and camera pose. Earlier algorithms for this particular kind of motion were not concerned with determining the tilt of the camera, focusing instead on recovering only the motion. Estimating the tilt is, however, a necessary step in order to create a rectified map for a SLAM system. Our contribution extends the aforementioned recent method, and we demonstrate that our enhanced algorithm gives more accurate estimates of the motion parameters.
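    For background, the plane-induced homography relating two views of the floor follows the textbook formula H = K (R - t n^T / d) K^{-1}, for the plane n^T X = d expressed in the first camera's frame. A sketch with illustrative values for the planar-motion case (a perfectly downward-facing camera, so the plane normal coincides with the optical axis):

```python
# Plane-induced homography for planar motion above the floor.
import numpy as np

def plane_homography(K, R, t, n, d):
    # H = K (R - t n^T / d) K^{-1} maps points on the plane n.X = d
    # (first camera frame) from view 1 to view 2; H is defined up to scale.
    return K @ (R - np.outer(t, n) / d) @ np.linalg.inv(K)

K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])      # illustrative intrinsics

theta = 0.1                                # yaw about the plane normal
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])
t = np.array([0.2, 0.1, 0.0])              # translation parallel to floor
n = np.array([0.0, 0.0, 1.0])              # floor normal = optical axis
d = 1.5                                    # constant camera height

H = plane_homography(K, R, t, n, d)
print(H / H[2, 2])
```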